Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario

Authors

Abstract

Often a face has a voice. Appearance sometimes has a strong relationship with one's voice. In this work, we study how a face can be converted to a voice, which is face-based voice conversion. Since there is no clean dataset that contains both face and speech, face-based voice conversion faces difficult learning and low-quality problems caused by background noise or echo. Too much redundant information for face-to-voice conversion also causes synthesis of a general style of speech. Furthermore, previous work tried to disentangle speech with bottleneck adjustment. However, it is hard to decide on the size of the bottleneck. Therefore, we propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing the general style of speech, we utilize framewise facial embedding. Adversarial learning with a multi-scale discriminator is applied so that the model achieves better quality. In addition, a self-attention module is added to focus on content-related features for in-the-wild data. Quantitative experiments show that our method outperforms previous work.


Similar resources

Speech Analysis – Synthesis Based on the Ptdft for Voice Conversion

The voice conversion problem has become very popular worldwide. It has applications in many fields, for example in systems that make use of prerecorded speech, such as voice mailboxes or text-to-speech synthesizers based on acoustic unit concatenation. In such cases, voice modification would be a simple and efficient way to create a desired variety of voices while avoiding recording of different spe...

GMM-based voice conversion applied to emotional speech synthesis

Voice conversion method is applied to synthesizing emotional speech from standard reading (neutral) speech. Pairs of neutral speech and emotional speech are used for conversion rule training. The conversion adopts GMM (Gaussian Mixture Model) with DFW (Dynamic Frequency Warping). We also adopt STRAIGHT, the high-quality speech analysis-synthesis algorithm. As conversion target emotions, (Hot) a...

Emotional Speech Synthesis Based on Improved Codebook Mapping Voice Conversion

This paper presents a spectral transformation method for emotional speech synthesis based on voice conversion framework. Three emotions are studied, including anger, happiness and sadness. For the sake of high naturalness, superior speech quality and emotion expressiveness, our original STASC system is modified by introducing a new feature selection strategy and hierarchical codebook mapping pr...

Evaluation of VTLN-based voice conversion for embedded speech synthesis

Recently, we demonstrated that vocal tract length normalization (VTLN) can be applied to voice conversion tasks. In particular, when the conversion algorithm is performed in time domain, this technique is very resource-efficient and, consequently, suitable for embedded applications. In this paper, we use VTLN-based voice conversion as a novel feature of a small footprint speech synthesizer runni...

Voice characteristics conversion for HMM-based speech synthesis system

In this paper, we describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved by changing HMM parameters appropriately. To transform the voice characteristics of synthesized speech to the target speaker, we applied MAP/VFS algorithm to...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26607